Introduction

The following datasets are used in this demonstration:

Table of contents

  1. Population and household estimates (univariate)

    1.1 Gender ratio

    1.2 Population and gender ratio by region

    1.3 Population by outward postcode using choropleth map

    1.4 Population by postcode parent area using choropleth map

    1.5 Male/Female ratio by postcode parent area using choropleth map

  2. House prices and earnings (multivariate)

    2.1 Correlation between median house price and median household earnings

    2.2 Median and lower quartile house prices from 2002 to 2020 (animated)

  3. Text analysis and visualisation

    3.1 Text tokenization and word cloud

    3.2 Basic sentiment analysis with VADER


Preparation

Run this code cell to install the required packages. If it fails, install the packages manually in your virtual environment.
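If the install cell fails, it helps to know which packages are actually missing before installing anything by hand. The snippet below is a minimal sketch of such a check. The package list is an assumption inferred from the sections that follow, not the notebook's actual requirements, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Hypothetical package list inferred from later sections; adjust to your needs.
REQUIRED = ["pandas", "plotly", "nltk", "wordcloud"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    # Suggest a manual install command for whatever is not importable.
    print("Install manually, e.g.: pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```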

Import the packages used below and replace the default plotting backend.

1. Population and household estimates

1.1 Gender ratio

1.2 Population and gender ratio by region

1.3 Population by outward postcode using choropleth map

Concatenate the individual postcode GeoJSON mappings into a single variable, geojson_uk.
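One way to do the concatenation, assuming each postcode file is a GeoJSON FeatureCollection, is to pool the `features` lists into one collection. This is a sketch only: the two sample collections, their coordinates, and the helper `concat_feature_collections` are made up for illustration.

```python
def concat_feature_collections(collections):
    """Merge several GeoJSON FeatureCollections into a single one."""
    features = []
    for collection in collections:
        # Tolerate collections that have no "features" key.
        features.extend(collection.get("features", []))
    return {"type": "FeatureCollection", "features": features}


# Two tiny hypothetical collections; real files would hold polygons.
area_one = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "AB10"},
        "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
    }],
}
area_two = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "AL1"},
        "geometry": {"type": "Point", "coordinates": [1.0, 1.0]},
    }],
}

geojson_uk = concat_feature_collections([area_one, area_two])
print(len(geojson_uk["features"]))
```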

Plot population by postcode area. Please note that the open-source geographical data I used in this demonstration comes from Wikipedia and does not cover every region of England, which leaves white areas on the map.

1.4 Population by postcode parent area using choropleth map

In the above plot, the regions are probably too granular to show an obvious trend. Let's merge them into parent regions.

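Now we have merged the outward postcodes into their parent areas, so the population can be plotted again at the coarser level.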
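The merge itself can be sketched as follows, assuming the parent area is the leading run of letters in an outward code (for example "SW1A" belongs to "SW"). The function names and the population figures are hypothetical.

```python
import re
from collections import defaultdict


def parent_area(outward_code):
    """Return the leading letters of an outward code, e.g. 'SW1A' -> 'SW'."""
    match = re.match(r"[A-Za-z]+", outward_code.strip())
    if match is None:
        raise ValueError(f"not an outward postcode: {outward_code!r}")
    return match.group(0).upper()


def population_by_parent_area(population_by_outward):
    """Sum populations of outward codes that share a parent area."""
    totals = defaultdict(int)
    for outward, population in population_by_outward.items():
        totals[parent_area(outward)] += population
    return dict(totals)


# Made-up figures, for illustration only.
sample = {"SW1A": 100, "SW2": 250, "M1": 300, "M14": 50}
merged = population_by_parent_area(sample)
print(merged)
```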

2. House prices and earnings

Define a reusable function that reads the source file and processes it into a DataFrame in the required format.

Read the dataset into a DataFrame.

2.1 Correlation between median house price and median household earnings
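As a reminder of what a correlation coefficient measures, here is a dependency-free sketch of Pearson's r. It is not necessarily the method used in this notebook, and the earnings and price values are toy numbers, not the real dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy values chosen to be perfectly linear; not real data.
earnings = [20, 30, 40, 50]
prices = [150, 200, 250, 300]
r = pearson_r(earnings, prices)
print(round(r, 3))
```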

2.2 Median and lower quartile house prices from 2002 to 2020 (animated)

3. Text analysis and visualisation

3.1 Text tokenization and word cloud
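A word cloud is driven by token frequencies, so the core of this step can be sketched with the standard library alone. The regular-expression tokenizer, the tiny stop-word set, and the sample sentences are all assumptions for illustration; the notebook may well use a different tokenizer.

```python
import re
from collections import Counter

# A deliberately tiny stop-word set; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts):
    """Count non-stop-word tokens across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return counts


# Made-up sentences standing in for the real text data.
sample = ["The weather is lovely today", "Lovely weather in the park"]
freqs = word_frequencies(sample)
print(freqs.most_common(2))
```

A frequency mapping like `freqs` is the kind of input a word cloud renderer works from: the more frequent a word, the larger it is drawn.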

3.2 Basic sentiment analysis with VADER

We can use NLTK's built-in VADER model to run a basic sentiment analysis on these tweets. VADER is lexicon- and rule-based, so no training is needed.